1 Introduction

The talk accompanying this material can be found here

1.1 Package Introduction

extraChIPs was developed to provide key infrastructure for a larger ChIP-Seq workflow comparing binding dynamics across multiple ChIP targets. The focus is on:

  • Differential Signal Detection
  • Comparison between ChIP targets or sets of results
  • Working with GRanges objects containing important values in the mcols() element
  • Visualisation of Data and Results

The package was developed to utilise existing S4 Object Classes and behave in a tidy-friendly manner, whilst not being strictly designed under tidy programming guidelines. Some functions may be analogous to those in existing packages, but the aim is to be simple and intuitive to use, whilst simplifying high-performance analyses. Previous work by the authors of DiffBind (Ross-Innes et al. 2012) and csaw (Lun and Smyth 2016) has been instrumental in the package design, along with the tidyverse (Wickham et al. 2019) and edgeR (Chen, Lun, and Smyth 2016). The intention of the package is to integrate easily with these packages, wrap multiple functions where appropriate, and provide new tools for both visualisation and working with results.

1.2 Data and Workshop Setup

Today’s workshop will focus on the devel version of the package, which can be installed using

BiocManager::install(
  "extraChIPs", version = "devel", update = FALSE, ask = FALSE
)

Alternatively, this can be installed directly from github

BiocManager::install(
  "smped/extraChIPs", ref = "devel", update = FALSE, ask = FALSE
)

Additional packages required can be installed as follows:

pkg <- c(
  "rtracklayer", "tidyverse", "glue", "plyranges", "Rsamtools", "csaw", 
  "edgeR","patchwork"
)
BiocManager::install(pkgs = pkg, update = FALSE, ask = FALSE)
library(tidyverse)
library(glue)
library(plyranges)
library(extraChIPs)
library(Rsamtools)
library(csaw)
library(edgeR)
library(rtracklayer)
library(patchwork)
theme_set(theme_bw())

Data can be obtained from here and should be placed in a folder called data within your current working directory. This data has been pre-processed here using the prepareChIPs workflow (Pederson 2023), and subset to a region on Chromosome 10 for easier working in today’s context.

The two targets in this dataset are the Estrogen Receptor (ER\(\alpha\)) and the histone mark H3K27ac. Data has been taken from ZR-75-1 cells cultured in Estrogen (E2) or Estrogen + Dihydrotestosterone (E2DHT), and was first published by researchers from the Dame Roma Mitchell Cancer Research Laboratories (Hickey et al. 2021)

The sample sheet for the ER\(\alpha\) samples can be setup by copying the following code. As can be seen, we have three samples from the E2 (baseline) group with three more treated by the addition of DHT.

targets <- c("ER", "H3K27ac")
treat_levels <- c("E2", "E2DHT")
samples <- tibble(
  accession = paste0("SRR83151", 80:91),
  target = rep(targets, each = 6),
  treatment = treat_levels %>% 
    rep(each = 3) %>% 
    rep_len(12) %>% 
    factor(levels = treat_levels),
  input = "SRR8315192"
)
## Split for easier wrangling
samples <- split(samples, samples$target)
samples
## $ER
## # A tibble: 6 × 4
##   accession  target treatment input     
##   <chr>      <chr>  <fct>     <chr>     
## 1 SRR8315180 ER     E2        SRR8315192
## 2 SRR8315181 ER     E2        SRR8315192
## 3 SRR8315182 ER     E2        SRR8315192
## 4 SRR8315183 ER     E2DHT     SRR8315192
## 5 SRR8315184 ER     E2DHT     SRR8315192
## 6 SRR8315185 ER     E2DHT     SRR8315192
## 
## $H3K27ac
## # A tibble: 6 × 4
##   accession  target  treatment input     
##   <chr>      <chr>   <fct>     <chr>     
## 1 SRR8315186 H3K27ac E2        SRR8315192
## 2 SRR8315187 H3K27ac E2        SRR8315192
## 3 SRR8315188 H3K27ac E2        SRR8315192
## 4 SRR8315189 H3K27ac E2DHT     SRR8315192
## 5 SRR8315190 H3K27ac E2DHT     SRR8315192
## 6 SRR8315191 H3K27ac E2DHT     SRR8315192

We’ll also setup colours for each treatment for consistent visualisations throughout the analysis

treat_colours <- setNames(c("steelblue", "red3"), treat_levels)

Finally, we’ll define a Seqinfo object just for today’s data.

sq <- Seqinfo(
  seqnames = "chr10", seqlengths = 135534747, isCircular = FALSE, 
  genome = "hg19"
)

2 Core Infrastructure

2.1 *MC() Functions

Many functions exist for combining sets of ranges, such as reduce(), intersect(), setdiff() etc, however these have been extended in extraChIPs to reduceMC(), intersectMC(), setdiffMC() and unionMC() as well as tidy-aligned functions chopMC() and distinctMC(). This group of functions behaves identically to the core GenomicRanges versions, but retains data in the mcols element. These are used internally by many functions within the package, but also have many stand-alone uses.

Define a simple set of ranges, with ids in the mcols

x <- GRanges(c("chr1:1-10", "chr1:6-12"))
x$id <- c("range1", "range2")
x$gene <- "gene1"
x
## GRanges object with 2 ranges and 2 metadata columns:
##       seqnames    ranges strand |          id        gene
##          <Rle> <IRanges>  <Rle> | <character> <character>
##   [1]     chr1      1-10      * |      range1       gene1
##   [2]     chr1      6-12      * |      range2       gene1
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

When calling reduce() we lose all mcols information.

reduce(x)
## GRanges object with 1 range and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chr1      1-12      *
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

These are retained when calling reduceMC

reduceMC(x)
## GRanges object with 1 range and 2 metadata columns:
##       seqnames    ranges strand |              id        gene
##          <Rle> <IRanges>  <Rle> | <CharacterList> <character>
##   [1]     chr1      1-12      * |   range1,range2       gene1
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

All other functions behave in a similar manner

y <- GRanges(c("chr1:11-15"))
intersect(x, y)
## GRanges object with 1 range and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chr1     11-12      *
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths
intersectMC(x, y)
## GRanges object with 1 range and 2 metadata columns:
##       seqnames    ranges strand |          id        gene
##          <Rle> <IRanges>  <Rle> | <character> <character>
##   [1]     chr1     11-12      * |      range2       gene1
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths
setdiff(x, y)
## GRanges object with 1 range and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chr1      1-10      *
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths
setdiffMC(x, y)
## GRanges object with 1 range and 2 metadata columns:
##       seqnames    ranges strand |              id        gene
##          <Rle> <IRanges>  <Rle> | <CharacterList> <character>
##   [1]     chr1      1-10      * |   range1,range2       gene1
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths
union(x, y)
## GRanges object with 1 range and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chr1      1-15      *
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths
unionMC(x, y)
## GRanges object with 1 range and 2 metadata columns:
##       seqnames    ranges strand |              id        gene
##          <Rle> <IRanges>  <Rle> | <CharacterList> <character>
##   [1]     chr1      1-15      * |   range1,range2       gene1
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

2.2 tibble Coercion

As this package is also designed to play well with the tidyverse the functions as_tibble() and colToRanges() have been introduced. When coercing to a tibble, the GRanges element is coerced to a character vector.

tbl <- as_tibble(x)
tbl
## # A tibble: 2 × 3
##   range     id     gene 
##   <chr>     <chr>  <chr>
## 1 chr1:1-10 range1 gene1
## 2 chr1:6-12 range2 gene1

The tibble can be returned to the more conventional set of columns by setting rangeAsChar = FALSE

as_tibble(x, rangeAsChar = FALSE)
## # A tibble: 2 × 7
##   seqnames start   end width strand id     gene 
##   <fct>    <int> <int> <int> <fct>  <chr>  <chr>
## 1 chr1         1    10    10 *      range1 gene1
## 2 chr1         6    12     7 *      range2 gene1

Additional methods have been implemented for easy coercion of DataFrame, GInteractions, Seqinfo, SummarzedExperiment and TopTags objects

2.3 colToRanges

Coercion is implemented from tibble to GRanges using colToRanges()

colToRanges(tbl, var = "range")
## GRanges object with 2 ranges and 2 metadata columns:
##       seqnames    ranges strand |          id        gene
##          <Rle> <IRanges>  <Rle> | <character> <character>
##   [1]     chr1      1-10      * |      range1       gene1
##   [2]     chr1      6-12      * |      range2       gene1
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths
colToRanges(tbl, var = "range", seqinfo = seqinfo(x))
## GRanges object with 2 ranges and 2 metadata columns:
##       seqnames    ranges strand |          id        gene
##          <Rle> <IRanges>  <Rle> | <character> <character>
##   [1]     chr1      1-10      * |      range1       gene1
##   [2]     chr1      6-12      * |      range2       gene1
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Whilst the seqinfo is clearly lost in the reverse coercion, colToRanges() a seqinfo object can be provided.

colToRanges can also move GRanges objects in mcols to the ‘backbone’ ranges. Here we’ll add a column which contains the peak centre to our original set of ranges, then we’ll move that to become the ‘backbone’ ranges.

x$centre <- GRanges(paste0("chr1:", c(5, 10)), seqinfo = seqinfo(x))
x
## GRanges object with 2 ranges and 3 metadata columns:
##       seqnames    ranges strand |          id        gene    centre
##          <Rle> <IRanges>  <Rle> | <character> <character> <GRanges>
##   [1]     chr1      1-10      * |      range1       gene1    chr1:5
##   [2]     chr1      6-12      * |      range2       gene1   chr1:10
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths
colToRanges(x, var = "centre")
## GRanges object with 2 ranges and 2 metadata columns:
##       seqnames    ranges strand |          id        gene
##          <Rle> <IRanges>  <Rle> | <character> <character>
##   [1]     chr1         5      * |      range1       gene1
##   [2]     chr1        10      * |      range2       gene1
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

3 Working With Peaks

3.1 Import and Visualisation

Broad and Narrow Peak files as output by peak callers such as macs2 callpeak (Zhang et al. 2008) can be imported using importPeaks(). This will always return a GRangesList for simple downstream analysis. If importing narrowPeak files, the peak centre can be automatically added.

fl <- glue("data/ER/{samples$ER$accession}_peaks.narrowPeak")
peaks <- importPeaks(fl, seqinfo = sq, centre = TRUE)
names(peaks) <- str_remove_all(names(peaks), "_peaks.narrowPeak")

(Names can also be set using glue syntax, however the above is generally more easily comprehended at first glance.)

Blacklisted regions and grey-listed regions are also able to be applied during import. Blacklisted regions are usually available from public sources, whilst grey-listed regions are derived from a relevant input.

These have also been provided and can be loaded separately then combined.

bg_list <- file.path(
  "data", "annotations", c("chr10_blacklist.bed", "chr10_greylist.bed")
) %>% 
  lapply(import.bed, seqinfo = sq) %>% 
  lapply(granges) %>% 
  GRangesList() %>% 
  unlist() %>% 
  sort()
peaks <- importPeaks(fl, seqinfo = sq, centre = TRUE, blacklist = bg_list)
names(peaks) <- str_remove_all(names(peaks), "_peaks.narrowPeak")

Now we’ve loaded our peaks and excluded peaks within unreliable ranges, one of the first tasks might be to check how similar our replicates are. This can easily be performed using plotOverlaps()

plotOverlaps(peaks, set_col = treat_colours[samples$ER$treatment])
Overlapping peaks between all replicates

Figure 1: Overlapping peaks between all replicates

plotOverlaps() calls the plotting functions provided by ComplexUpset::upset() (Krassowski 2020) and as such, can handle any additional parameters which are understood by this function.

plotOverlaps(
  peaks, .sort_sets = FALSE, 
  set_col = treat_colours[samples$ER$treatment], n_intersections = 10
)
Overlapping peaks, showing only intersections with 10 or more members and retaining the original sample ordering

Figure 2: Overlapping peaks, showing only intersections with 10 or more members and retaining the original sample ordering

3.2 Consensus Peaks

The next step might be to form a set of consensus peaks for each treatment. Just forming a set of consensus peaks for the E2 samples can be done manually

makeConsensus(peaks[1:3])
## GRanges object with 244 ranges and 4 metadata columns:
##         seqnames            ranges strand | SRR8315180 SRR8315181 SRR8315182
##            <Rle>         <IRanges>  <Rle> |  <logical>  <logical>  <logical>
##     [1]    chr10 43048195-43048529      * |       TRUE       TRUE       TRUE
##     [2]    chr10 43521739-43522260      * |       TRUE       TRUE       TRUE
##     [3]    chr10 43540042-43540390      * |       TRUE      FALSE       TRUE
##     [4]    chr10 43606238-43606573      * |       TRUE       TRUE       TRUE
##     [5]    chr10 43851214-43851989      * |      FALSE       TRUE       TRUE
##     ...      ...               ...    ... .        ...        ...        ...
##   [240]    chr10 99168353-99168649      * |       TRUE       TRUE       TRUE
##   [241]    chr10 99207868-99208156      * |      FALSE       TRUE       TRUE
##   [242]    chr10 99326100-99326363      * |      FALSE      FALSE       TRUE
##   [243]    chr10 99331363-99331730      * |       TRUE       TRUE       TRUE
##   [244]    chr10 99621632-99621961      * |      FALSE       TRUE       TRUE
##                 n
##         <numeric>
##     [1]         3
##     [2]         3
##     [3]         2
##     [4]         3
##     [5]         2
##     ...       ...
##   [240]         3
##   [241]         2
##   [242]         1
##   [243]         3
##   [244]         2
##   -------
##   seqinfo: 1 sequence from hg19 genome

In reality, we may prefer to only retain peaks found in 2/3 samples and we can do this by setting p = 2/3. We can also also makeConsensus to retain one or more columns from the original peaks using var =

makeConsensus(peaks[1:3], p = 2/3, var = "centre")
## GRanges object with 164 ranges and 5 metadata columns:
##         seqnames            ranges strand |                     centre
##            <Rle>         <IRanges>  <Rle> |              <NumericList>
##     [1]    chr10 43048195-43048529      * | 43048341,43048357,43048389
##     [2]    chr10 43521739-43522260      * | 43522008,43522012,43522039
##     [3]    chr10 43540042-43540390      * |          43540298,43540245
##     [4]    chr10 43606238-43606573      * | 43606418,43606408,43606421
##     [5]    chr10 43851214-43851989      * |          43851672,43851766
##     ...      ...               ...    ... .                        ...
##   [160]    chr10 99096784-99097428      * | 99097259,99097253,99097249
##   [161]    chr10 99168353-99168649      * | 99168506,99168486,99168513
##   [162]    chr10 99207868-99208156      * |          99207995,99208002
##   [163]    chr10 99331363-99331730      * | 99331621,99331585,99331580
##   [164]    chr10 99621632-99621961      * |          99621809,99621828
##         SRR8315180 SRR8315181 SRR8315182         n
##          <logical>  <logical>  <logical> <numeric>
##     [1]       TRUE       TRUE       TRUE         3
##     [2]       TRUE       TRUE       TRUE         3
##     [3]       TRUE      FALSE       TRUE         2
##     [4]       TRUE       TRUE       TRUE         3
##     [5]      FALSE       TRUE       TRUE         2
##     ...        ...        ...        ...       ...
##   [160]       TRUE       TRUE       TRUE         3
##   [161]       TRUE       TRUE       TRUE         3
##   [162]      FALSE       TRUE       TRUE         2
##   [163]       TRUE       TRUE       TRUE         3
##   [164]      FALSE       TRUE       TRUE         2
##   -------
##   seqinfo: 1 sequence from hg19 genome

By default, this takes the union of the ranges called as peaks in each sample. We may instead prefer to just retain regions covered in \(\geq\) 2/3 samples. Not that whilst we have the same number of ranges in this particular example, the ranges are tighter.

makeConsensus(peaks[1:3], p = 2/3, method = "coverage")
## GRanges object with 164 ranges and 4 metadata columns:
##         seqnames            ranges strand | SRR8315180 SRR8315181 SRR8315182
##            <Rle>         <IRanges>  <Rle> |  <logical>  <logical>  <logical>
##     [1]    chr10 43048224-43048519      * |       TRUE       TRUE       TRUE
##     [2]    chr10 43521773-43522218      * |       TRUE       TRUE       TRUE
##     [3]    chr10 43540200-43540384      * |       TRUE      FALSE       TRUE
##     [4]    chr10 43606264-43606559      * |       TRUE       TRUE       TRUE
##     [5]    chr10 43851570-43851940      * |      FALSE       TRUE       TRUE
##     ...      ...               ...    ... .        ...        ...        ...
##   [160]    chr10 99097038-99097398      * |       TRUE       TRUE       TRUE
##   [161]    chr10 99168367-99168608      * |       TRUE       TRUE       TRUE
##   [162]    chr10 99207908-99208139      * |      FALSE       TRUE       TRUE
##   [163]    chr10 99331412-99331687      * |       TRUE       TRUE       TRUE
##   [164]    chr10 99621674-99621944      * |      FALSE       TRUE       TRUE
##                 n
##         <numeric>
##     [1]         3
##     [2]         3
##     [3]         2
##     [4]         3
##     [5]         2
##     ...       ...
##   [160]         3
##   [161]         3
##   [162]         2
##   [163]         3
##   [164]         2
##   -------
##   seqinfo: 1 sequence from hg19 genome

In order to make consensus peaks for each condition, we can split by treatment group, then use a simple lapply() trick, to work with both treatment groups together.

cons_peaks <- peaks %>% 
  split(f = samples$ER$treatment) %>% 
  lapply(makeConsensus, p = 2/3, var = "centre") %>% 
  lapply(select, centre) %>% 
  lapply(mutate, centre = vapply(centre, mean, numeric(1))) %>% 
  GRangesList()

A lot happened in that code chunk, so let’s break it down

  1. We split samples based on the treatment (split(f = samples$ER$treatment)), then
  2. Identified consensus peaks within each treatment found in 2/3 samples, retaining the peak centre (lapply(makeConsensus, p = 2/3, var = "centre"))
  3. Subset the mcols to only retain the peak centre (lapply(select, centre)).
  4. The previous step returned centre estimates as a NumericList, so we took the mean of these for each consensus peak
  5. We coerced to a GRangeList

Now we can compare our two treatment conditions using plotOverlaps() again, which will return a Venn Diagram when passing a GRangesList of length 2.

plotOverlaps(cons_peaks, set_col = treat_colours)
Overlaps betwen consensus peaks specific to each treatment group

Figure 3: Overlaps betwen consensus peaks specific to each treatment group

Finally, we can form a set of “union” peaks, defined as those within a consensus peak for either treatment group.

union_peaks <- makeConsensus(cons_peaks, var = "centre") %>% 
  mutate(centre = vapply(centre, mean, numeric(1)) %>% round(0))

3.3 Mapping Peaks to Regulatory Regions

Now we have detected our ranges where ER\(\alpha\) appears to have bound, we might wish to find which genes are likely targets and what type of genomic feature is associated with each range. First let’s define some genomic regions.

A Gencode GTF has been provided in data, restricted to the regions of chr10 we’re working with today. We can load this then, split based on the feature type, giving a GRangesList with gene, transcript and exon-level features.

gtf <- file.path(
  "data", "annotations", "gencode.v43lift37.chr10.annotation.gtf.gz"
) %>% 
  import.gff() %>% 
  split(.$type)
seqinfo(gtf) <- sq

The function defineRegions() enables us to define 1) Promoter, 2) Upstream Promoter, 3) Intron, 4) Exon, 5) Proximal Intergenic and 6) Distal Intergenic Regions. The distances for these regions can be specified by the user, but here we’ll just use the defaults.

regions <- defineRegions(
  genes = gtf$gene, transcripts = gtf$transcript, exons = gtf$exon
)
regions
## GRangesList object of length 6:
## $promoter
## GRanges object with 3359 ranges and 5 metadata columns:
##          seqnames              ranges strand |                region
##             <Rle>           <IRanges>  <Rle> |           <character>
##      [1]    chr10         61989-64988      * | Promoter (-2500/+500)
##      [2]    chr10         66360-69392      * | Promoter (-2500/+500)
##      [3]    chr10         88152-91151      * | Promoter (-2500/+500)
##      [4]    chr10         94710-97961      * | Promoter (-2500/+500)
##      [5]    chr10       119604-122603      * | Promoter (-2500/+500)
##      ...      ...                 ...    ... .                   ...
##   [3355]    chr10 135488075-135491074      * | Promoter (-2500/+500)
##   [3356]    chr10 135491384-135494383      * | Promoter (-2500/+500)
##   [3357]    chr10 135494684-135497683      * | Promoter (-2500/+500)
##   [3358]    chr10 135503135-135506134      * | Promoter (-2500/+500)
##   [3359]    chr10 135513071-135518323      * | Promoter (-2500/+500)
##                                                                  gene_id
##                                                          <CharacterList>
##      [1]                                            ENSG00000260370.2_11
##      [2]  ENSG00000260370.2_11,ENSG00000260370.2_11,ENSG00000260370.2_11
##      [3]                                             ENSG00000237297.1_7
##      [4] ENSG00000261456.6_7,ENSG00000261456.6_7,ENSG00000261456.6_7,...
##      [5]                                             ENSG00000261456.6_7
##      ...                                                             ...
##   [3355]                                             ENSG00000276046.1_5
##   [3356]                                             ENSG00000276904.1_5
##   [3357]                                             ENSG00000278641.1_5
##   [3358]                                               ENSG00000226880.1
##   [3359] ENSG00000213147.3_3,ENSG00000261914.2_3,ENSG00000282875.1_8,...
##                                                gene_name
##                                          <CharacterList>
##      [1]                                 ENSG00000260370
##      [2] ENSG00000260370,ENSG00000260370,ENSG00000260370
##      [3]                                 ENSG00000237297
##      [4]                           TUBB8,TUBB8,TUBB8,...
##      [5]                                           TUBB8
##      ...                                             ...
##   [3355]                                         DUX4L13
##   [3356]                                         DUX4L14
##   [3357]                                         DUX4L15
##   [3358]                                    XX-2136C48.7
##   [3359]         RPL23AP60,RPL23AP84,ENSG00000282875,...
##                                                            transcript_id
##                                                          <CharacterList>
##      [1]                                             ENST00000562162.2_4
##      [2]     ENST00000682078.1_1,ENST00000566940.2_4,ENST00000683798.1_1
##      [3]                                             ENST00000416477.1_2
##      [4] ENST00000568584.6_6,ENST00000568866.5_4,ENST00000561967.1_4,...
##      [5]                                             ENST00000564130.2_4
##      ...                                                             ...
##   [3355]                                             ENST00000554103.2_2
##   [3356]                                             ENST00000621227.1_2
##   [3357]                                             ENST00000611059.1_2
##   [3358]                                               ENST00000411632.1
##   [3359] ENST00000450011.1_2,ENST00000571193.2_2,ENST00000635574.1_4,...
##                                          transcript_name
##                                          <CharacterList>
##      [1]                                 ENST00000562162
##      [2] ENST00000682078,ENST00000566940,ENST00000683798
##      [3]                                 ENST00000416477
##      [4]               TUBB8-206,TUBB8-207,TUBB8-201,...
##      [5]                                       TUBB8-204
##      ...                                             ...
##   [3355]                                     DUX4L13-201
##   [3356]                                     DUX4L14-201
##   [3357]                                     DUX4L15-201
##   [3358]                                XX-2136C48.7-001
##   [3359] RPL23AP60-201,RPL23AP84-201,ENST00000635574,...
##   -------
##   seqinfo: 1 sequence from hg19 genome
## 
## ...
## <5 more elements>

Now we can assign these to the peaks using bestOverlap

union_peaks$region <- bestOverlap(union_peaks, regions)
union_peaks
## GRanges object with 188 ranges and 5 metadata columns:
##         seqnames            ranges strand |    centre        E2     E2DHT
##            <Rle>         <IRanges>  <Rle> | <numeric> <logical> <logical>
##     [1]    chr10 43048195-43048529      * |  43048372      TRUE      TRUE
##     [2]    chr10 43521739-43522260      * |  43522014      TRUE      TRUE
##     [3]    chr10 43540042-43540457      * |  43540310      TRUE      TRUE
##     [4]    chr10 43606184-43606573      * |  43606417      TRUE      TRUE
##     [5]    chr10 43851214-43851989      * |  43851719      TRUE     FALSE
##     ...      ...               ...    ... .       ...       ...       ...
##   [184]    chr10 99168349-99168690      * |  99168489      TRUE      TRUE
##   [185]    chr10 99207862-99208157      * |  99208004      TRUE      TRUE
##   [186]    chr10 99331323-99331741      * |  99331592      TRUE      TRUE
##   [187]    chr10 99340107-99340354      * |  99340248     FALSE      TRUE
##   [188]    chr10 99621632-99621988      * |  99621798      TRUE      TRUE
##                 n              region
##         <numeric>         <character>
##     [1]         2            promoter
##     [2]         2   distal_intergenic
##     [3]         2   distal_intergenic
##     [4]         2              intron
##     [5]         1 proximal_intergenic
##     ...       ...                 ...
##   [184]         2              intron
##   [185]         2            promoter
##   [186]         2            promoter
##   [187]         1   upstream_promoter
##   [188]         2   upstream_promoter
##   -------
##   seqinfo: 1 sequence from hg19 genome

When providing a GRangesList the overlaps are defined by the element names. Alternatively, we could pass a GRanges object and specify the name from the mcols element we wish to us.

union_peaks$region <- bestOverlap(union_peaks, unlist(regions), var = "region")
union_peaks
## GRanges object with 188 ranges and 5 metadata columns:
##         seqnames            ranges strand |    centre        E2     E2DHT
##            <Rle>         <IRanges>  <Rle> | <numeric> <logical> <logical>
##     [1]    chr10 43048195-43048529      * |  43048372      TRUE      TRUE
##     [2]    chr10 43521739-43522260      * |  43522014      TRUE      TRUE
##     [3]    chr10 43540042-43540457      * |  43540310      TRUE      TRUE
##     [4]    chr10 43606184-43606573      * |  43606417      TRUE      TRUE
##     [5]    chr10 43851214-43851989      * |  43851719      TRUE     FALSE
##     ...      ...               ...    ... .       ...       ...       ...
##   [184]    chr10 99168349-99168690      * |  99168489      TRUE      TRUE
##   [185]    chr10 99207862-99208157      * |  99208004      TRUE      TRUE
##   [186]    chr10 99331323-99331741      * |  99331592      TRUE      TRUE
##   [187]    chr10 99340107-99340354      * |  99340248     FALSE      TRUE
##   [188]    chr10 99621632-99621988      * |  99621798      TRUE      TRUE
##                 n                 region
##         <numeric>            <character>
##     [1]         2  Promoter (-2500/+500)
##     [2]         2     Intergenic (>10kb)
##     [3]         2     Intergenic (>10kb)
##     [4]         2                 Intron
##     [5]         1     Intergenic (<10kb)
##     ...       ...                    ...
##   [184]         2                 Intron
##   [185]         2  Promoter (-2500/+500)
##   [186]         2  Promoter (-2500/+500)
##   [187]         1 Upstream Promoter (-..
##   [188]         2 Upstream Promoter (-..
##   -------
##   seqinfo: 1 sequence from hg19 genome

Let’s set these as a factor in order as contained in the original set of regions.

reg_levels <- vapply(regions, function(x) x$region[1], character(1))
union_peaks$region <- factor(union_peaks$region, levels = reg_levels)

We can check the distribution of our peaks against these regions using plotPie()

region_colours <- hcl.colors(6, "viridis", rev = TRUE)
names(region_colours) <- reg_levels
plotPie(union_peaks, fill = "region") +
  scale_fill_manual(values = region_colours)
Distribution of peaks within defined regulatory regions

Figure 4: Distribution of peaks within defined regulatory regions

The labels for plotPie() use the glue syntax, lending enormous flexibility to what can be placed here. Whist the defaults are hopefully useful, in the code below. we’ll modify these to wrap the text using str_wrap() and add n = to the peak numbers.

plotPie(
  union_peaks, fill = "region", total_size = 5,
  cat_glue = "{str_wrap(.data[[fill]], 15)}\nn = {comma(n, 1)}\n({percent(p, 0.1)})"
) +
  scale_fill_manual(values = region_colours)
Distribution of peaks with manually tweaked segment labels

Figure 5: Distribution of peaks with manually tweaked segment labels

Finally, we might wish to know which genes are likely targets for each peak. mapByFeature() uses a mapping strategy which maps peaks/ranges to genes differently depending on which type of feature they overlap.

  1. Ranges overlapping a promoter are only assigned to the gene(s) associated with that promoter
  2. Ranges overlapping an enhancer are mapped to all gene within a given distance
  3. Ranges overlapping a long-range interaction are mapped to genes at both ends of the interaction
  4. Remaining ranges are mapped to the nearest gene within a given distance
union_peaks <- mapByFeature(
  union_peaks, genes = gtf$gene, prom = regions$promoter
)
union_peaks %>% filter(str_detect(region, "^Promoter"), !E2)
## GRanges object with 7 ranges and 7 metadata columns:
##       seqnames            ranges strand |    centre        E2     E2DHT
##          <Rle>         <IRanges>  <Rle> | <numeric> <logical> <logical>
##   [1]    chr10 54169026-54169221      * |  54169133     FALSE      TRUE
##   [2]    chr10 63808711-63809004      * |  63808893     FALSE      TRUE
##   [3]    chr10 69847132-69847384      * |  69847254     FALSE      TRUE
##   [4]    chr10 71879941-71880280      * |  71880136     FALSE      TRUE
##   [5]    chr10 74114141-74114500      * |  74114282     FALSE      TRUE
##   [6]    chr10 75579311-75579548      * |  75579425     FALSE      TRUE
##   [7]    chr10 88281479-88281723      * |  88281596     FALSE      TRUE
##               n                region
##       <numeric>              <factor>
##   [1]         1 Promoter (-2500/+500)
##   [2]         1 Promoter (-2500/+500)
##   [3]         1 Promoter (-2500/+500)
##   [4]         1 Promoter (-2500/+500)
##   [5]         1 Promoter (-2500/+500)
##   [6]         1 Promoter (-2500/+500)
##   [7]         1 Promoter (-2500/+500)
##                                          gene_id       gene_name
##                                  <CharacterList> <CharacterList>
##   [1]                        ENSG00000227972.1_6        THAP12P3
##   [2]                      ENSG00000150347.17_12          ARID5B
##   [3]                       ENSG00000138347.17_9            MYPN
##   [4]                       ENSG00000042286.15_8           AIFM2
##   [5]                      ENSG00000148719.16_11         DNAJB12
##   [6]                      ENSG00000148660.22_13          CAMK2G
##   [7] ENSG00000062650.20_11,ENSG00000227896.2_11    WAPL,WAPL-DT
##   -------
##   seqinfo: 1 sequence from hg19 genome

4 Differential Signal Detection

4.1 Using Fixed-Width Windows

4.1.1 Reading In Counts

Now that we’ve defined our set of peaks present under either condition, we can use these to perform a differential signal analysis. Existing packages such as DiffBind use fixed-width regions to minimise bias from variable sized regions. Using extraChIPs we can define these ranges using our peak centres that we’ve carried forward.

centred_peaks <- union_peaks %>% 
  mutate(
    centre = paste0(seqnames, ":", centre),
    union_peak = granges(.)
  ) %>% 
  colToRanges(var = "centre", seqinfo = sq) %>% 
  resize(width = 400, fix = "center") 
centred_peaks
## GRanges object with 188 ranges and 7 metadata columns:
##         seqnames            ranges strand |        E2     E2DHT         n
##            <Rle>         <IRanges>  <Rle> | <logical> <logical> <numeric>
##     [1]    chr10 43048172-43048571      * |      TRUE      TRUE         2
##     [2]    chr10 43521814-43522213      * |      TRUE      TRUE         2
##     [3]    chr10 43540110-43540509      * |      TRUE      TRUE         2
##     [4]    chr10 43606217-43606616      * |      TRUE      TRUE         2
##     [5]    chr10 43851519-43851918      * |      TRUE     FALSE         1
##     ...      ...               ...    ... .       ...       ...       ...
##   [184]    chr10 99168289-99168688      * |      TRUE      TRUE         2
##   [185]    chr10 99207804-99208203      * |      TRUE      TRUE         2
##   [186]    chr10 99331392-99331791      * |      TRUE      TRUE         2
##   [187]    chr10 99340048-99340447      * |     FALSE      TRUE         1
##   [188]    chr10 99621598-99621997      * |      TRUE      TRUE         2
##                           region                                    gene_id
##                         <factor>                            <CharacterList>
##     [1]    Promoter (-2500/+500)   ENSG00000234420.9_10,ENSG00000270762.1_3
##     [2]    Intergenic (>10kb)                             ENSG00000263795.1
##     [3]    Intergenic (>10kb)                         ENSG00000165731.21_15
##     [4]    Intron                                     ENSG00000165731.21_15
##     [5]    Intergenic (<10kb)                           ENSG00000285712.1_7
##     ...                      ...                                        ...
##   [184] Intron                   ENSG00000052749.14_14,ENSG00000231970.1_10
##   [185] Promoter (-2500/+500)    ENSG00000171311.13_12,ENSG00000171307.19_9
##   [186] Promoter (-2500/+500)                         ENSG00000165887.12_14
##   [187] Upstream Promoter (-5kb)                      ENSG00000165887.12_14
##   [188] Upstream Promoter (-5kb)                       ENSG00000155265.11_7
##                     gene_name              union_peak
##               <CharacterList>               <GRanges>
##     [1]       ZNF37BP,FXYD6P1 chr10:43048195-43048529
##     [2]               MIR5100 chr10:43521739-43522260
##     [3]                   RET chr10:43540042-43540457
##     [4]                   RET chr10:43606184-43606573
##     [5]       ENSG00000285712 chr10:43851214-43851989
##     ...                   ...                     ...
##   [184] RRP12,ENSG00000231970 chr10:99168349-99168690
##   [185]        EXOSC1,ZDHHC16 chr10:99207862-99208157
##   [186]                ANKRD2 chr10:99331323-99331741
##   [187]                ANKRD2 chr10:99340107-99340354
##   [188]               GOLGA7B chr10:99621632-99621988
##   -------
##   seqinfo: 1 sequence from hg19 genome

In the above, we’ve 1) created a new range from the centre-points, 2) placed the original range as a new column in the mcols element, 3) moved the centred range to being the primary range using colToRanges(). From there, we’ve resized to a standard width of 400bp. We can now use these ranges to count alignments across our set of samples. The bam files should be in the data directory and can be formed into a BamFileList, before being passed to regionCounts() from csaw. This will return a RangedSummarizedExperiment with totals from the entire bam file in the totals column of the colData element

er_bfl <- file.path("data", "ER", paste0(samples$ER$accession, ".bam")) %>% 
  BamFileList() %>% 
  setNames(samples$ER$accession)
er_se <- regionCounts(er_bfl, centred_peaks, ext = 200)
seqinfo(er_se) <- sq
er_se
## class: RangedSummarizedExperiment 
## dim: 188 6 
## metadata(2): final.ext param
## assays(1): counts
## rownames: NULL
## rowData names(7): E2 E2DHT ... gene_name union_peak
## colnames(6): SRR8315180 SRR8315181 ... SRR8315184 SRR8315185
## colData names(4): bam.files totals ext rlen
colData(er_se)
## DataFrame with 6 rows and 4 columns
##                         bam.files    totals       ext      rlen
##                       <character> <integer> <integer> <integer>
## SRR8315180 data/ER/SRR8315180.bam    317845       200        75
## SRR8315181 data/ER/SRR8315181.bam    337623       200        75
## SRR8315182 data/ER/SRR8315182.bam    341998       200        75
## SRR8315183 data/ER/SRR8315183.bam    315872       200        75
## SRR8315184 data/ER/SRR8315184.bam    352908       200        75
## SRR8315185 data/ER/SRR8315185.bam    347709       200        75

Let’s ensure our sample metadata is consistent between this object and samples$ER

colData(er_se) <- colData(er_se) %>% 
  as_tibble(rownames = "accession") %>% 
  left_join(samples$ER) %>% 
  as("DataFrame")

The use of fixed-width and centred ranges minimises bias which might be between samples with different signal characteristics, but importantly, the counts from these centred ranges as simply our proxy for the ranges defined earlier. Again, using colToRanges() we can place our original set of union peaks back as the GRanges element underlying the RangedSummarizedExperiment.

rowRanges(er_se) <- rowRanges(er_se) %>% 
  colToRanges(var = "union_peak") %>% 
  select(region, starts_with("gene"))

4.1.2 Visualising Data

Now that we’ve loaded counts, we can explore and visualise them in several ways. The first might be to check the count distributions, using plotAssayDensities().

plotAssayDensities(er_se, trans = "log1p", colour = "treatment") +
  scale_colour_manual(values = treat_colours)
Densities of raw counts for all centred regions, using the 'log1p' transformation

Figure 6: Densities of raw counts for all centred regions, using the ‘log1p’ transformation

We might follow this with a PCA

plotAssayPCA(
  er_se, trans = "log1p", colour = "treatment", label = "accession"
) +
  scale_colour_manual(values = treat_colours)
PCA plot for all ER samples

Figure 7: PCA plot for all ER samples

An important QC metric as an RLE plot which finds the median value across all samples & ranges to identify any samples with consistently higher/lower or divergent signal. By way of example, we may wish to normalise out counts using logCPM values for this purpose, so let’s perform this step first

assay(er_se, "logCPM") <- cpm(
  assay(er_se, "counts"), lib.size = er_se$totals, log = TRUE
)
plotAssayRle(er_se, assay = "logCPM", fill = "treatment") +
  geom_hline(yintercept = 0, linetype = 2) +
  scale_fill_manual(values = treat_colours) 
RLE plot for all ER samples

Figure 8: RLE plot for all ER samples

A problem sometimes faced in ChIP-Seq data which is less prevalent in RNA-Seq data that counts may be drawn from different distributions under different treatments, as DNA-binding is either increased or decreased globally by a treatment. We can simply view all samples summarised by treatment group in order to highlight any possible global differences.

plotAssayRle(er_se, assay = "logCPM", fill = "treatment", by_x = "treatment") +
  geom_hline(yintercept = 0, linetype = 2) +
  scale_fill_manual(values = treat_colours) 
RLE plot for ER samples grouped by treatment.

Figure 9: RLE plot for ER samples grouped by treatment

4.1.3 Differential Signal Analysis

Performing Differential Signal Analysis is now simple using fitAssayDiff(), which offers two methods of analysis: method = "qlf" (default) or method = "lt". Respectively, these apply Quasi-Likelihood fits using edgeR (Lund et al. 2012), as is appropriate for count data, or limma-trend (Law et al. 2014) if fitting on the log-transformed, (e.g. logCPM) data. By default, this will return a SummarizedExperiment with results added to the rowRanged()/rowData() element. Alternatively, we can return a GRanges object by setting asRanges = TRUE

X <- model.matrix(~treatment, data = colData(er_se))
er_results <- fitAssayDiff(er_se, design = X, asRanges = TRUE)

The default normalisation is to use Library Size as contained in the totals column. However, by setting lib.size = NULL then colSums() will be used if this is considered more suitable. Additionally, any of the normalisation methods available in edgeR::calcNormFactors() can be applied. Providing a column name to the groups argument will also normalise within groups, instead of across the entire dataset.

er_results <- fitAssayDiff(er_se, design = X, norm = "TMM", asRanges = TRUE)

Finally, a range-based H0 (McCarthy and Smyth 2009) can be applied by provided a fold-change value below which we consider change to be uninteresting, and we can also automatically add Differential Signal status.

er_results <- fitAssayDiff(
  er_se, design = X, norm = "TMM", asRanges = TRUE, fc = 1.2
) %>% 
  addDiffStatus()
arrange(er_results, PValue)
## GRanges object with 188 ranges and 8 metadata columns:
##         seqnames            ranges strand |                   region
##            <Rle>         <IRanges>  <Rle> |                 <factor>
##     [1]    chr10 81101906-81102928      * | Upstream Promoter (-5kb)
##     [2]    chr10 79629641-79630271      * | Promoter (-2500/+500)   
##     [3]    chr10 89407752-89408138      * | Intron                  
##     [4]    chr10 52233596-52233998      * | Exon                    
##     [5]    chr10 51547205-51547782      * | Promoter (-2500/+500)   
##     ...      ...               ...    ... .                      ...
##   [184]    chr10 93882729-93883168      * |    Intron               
##   [185]    chr10 71344398-71344749      * |    Intergenic (<10kb)   
##   [186]    chr10 76586249-76586887      * |    Promoter (-2500/+500)
##   [187]    chr10 82725993-82726916      * |    Intergenic (>10kb)   
##   [188]    chr10 64605136-64605347      * |    Intron               
##                                            gene_id
##                                    <CharacterList>
##     [1]                       ENSG00000108179.14_6
##     [2]                      ENSG00000151208.17_11
##     [3]   ENSG00000225913.2_9,ENSG00000196566.2_10
##     [4] ENSG00000198964.14_10,ENSG00000225303.2_10
##     [5]                        ENSG00000263639.7_7
##     ...                                        ...
##   [184]                      ENSG00000107864.15_12
##   [185]                        ENSG00000230755.1_6
##   [186]  ENSG00000272748.1_8,ENSG00000156650.14_17
##   [187]                        ENSG00000227209.1_5
##   [188]                        ENSG00000289487.1_1
##                               gene_name       logFC    logCPM      PValue
##                         <CharacterList>   <numeric> <numeric>   <numeric>
##     [1]                            PPIF    1.887377   8.02941 6.00421e-11
##     [2]                            DLG5    0.966791   8.23125 4.13714e-06
##     [3] ENSG00000225913,ENSG00000196566    1.712374   6.20441 1.00877e-05
##     [4]           SGMS1,ENSG00000225303    1.129786   6.70351 2.34471e-04
##     [5]                            MSMB    0.664784   7.44213 1.13493e-03
##     ...                             ...         ...       ...         ...
##   [184]                           CPEB3 -0.00513719   6.39992    0.985018
##   [185]                       MTATP6P23 -0.00519869   6.77464    0.985285
##   [186]           ENSG00000272748,KAT6B -0.00909379   8.12097    0.985618
##   [187]                         WARS2P1 -0.00371918   9.43978    0.994243
##   [188]                 ENSG00000289487  0.00109980   5.96820    0.997631
##                 FDR    status
##           <numeric>  <factor>
##     [1] 1.12879e-08 Increased
##     [2] 3.88891e-04 Increased
##     [3] 6.32163e-04 Increased
##     [4] 1.10201e-02 Increased
##     [5] 4.26732e-02 Increased
##     ...         ...       ...
##   [184]    0.996217 Unchanged
##   [185]    0.996217 Unchanged
##   [186]    0.996217 Unchanged
##   [187]    0.997631 Unchanged
##   [188]    0.997631 Unchanged
##   -------
##   seqinfo: 1 sequence from hg19 genome

Given these results are in a GRanges object, we can produce an MA-plot using conventional ggplot2 approaches.

status_colours <- c(
  Undetected = "grey70", Unchanged = "grey30", 
  Decreased = "blue", Increased = "red"
)
er_results %>% 
  as_tibble() %>% 
  ggplot(aes(logCPM, logFC)) +
  geom_point(aes(colour = status)) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_colour_manual(values = status_colours)
MA-plot for ERa analysis using fixed-width windows, TMM normalisation and setting 20% as the range beyond which we consider diferential signal to be of interest.

Figure 10: MA-plot for ERa analysis using fixed-width windows, TMM normalisation and setting 20% as the range beyond which we consider diferential signal to be of interest

4.1.4 Visualising Results

Additionally, we might like to compare Differential Signal Status against the regions we defined earlier and plotSplitDonut() is well suited to this. A key advantage of plotSplitDonut() is the ability to specify inner and outer palettes separately

er_results %>% 
  plotSplitDonut(
    inner = "region", outer = "status",
    inner_palette = region_colours, outer_palette = status_colours
  )
Split-Donut plot showing distribution of peaks with diferential signal

Figure 11: Split-Donut plot showing distribution of peaks with diferential signal

Once again, labels are controlled using the glue syntax and segments are able to be exploded based on matches to a regular expression

er_results %>% 
  plotSplitDonut(
    inner = "region", outer = "status",
    inner_palette = region_colours, outer_palette = status_colours,
    inner_glue = "n = {n}", inner_min_p = 0, inner_label_size = 3.5,
    outer_glue = "{str_wrap(.data[[inner]], 15)}\n{.data[[outer]]}\nn = {n}",
    outer_min_p = 0, outer_max_p = 0.03,
    explode_outer = "Increased|Decreased", explode_r = 0.3
  )
Split-Donut plot exploding segments with evidence of differential signal, and modifying segment labels

Figure 12: Split-Donut plot exploding segments with evidence of differential signal, and modifying segment labels

4.2 Using Sliding Windows

fitAssayDiff() is agnostic to sliding or fixed-width windows. Whilst fixed-width windows are relatively simple conceptually, sliding windows offer other advantages such as being able to manage data where signal comes from genomic regions which are highly variable in width, such as histone marks like H3K27ac. Sliding window analysis is well-implemented in csaw, however extraChIPs offers the additional ability to combine ranges using asymptotically correct harmonic mean p-value (Wilson 2019). The step of moving from genomic windows to a subset of windows which we can confidently assume contain signal is also able to be managed by extraChIPs using dualFilter().

For this section of the analysis, we’ll use data for the histone mark H3K27ac from the same cell type under the same two conditions (E2 + E2DHT). A new sample metadata sheet can be formed using the following, and we’ll immediately for the BamFileList for reading in counts.

acc <- c(samples$H3K27ac$accession, samples$H3K27ac$input) %>% 
  unique()
h3k_bfl <- file.path("data", "H3K27ac", paste0(acc, ".bam")) %>% 
  BamFileList() %>% 
  setNames(acc)

4.2.1 Defining Windows

Given we’re going to use sliding windows, we can use a smaller window size and need to set a step-size for ‘sliding’ along the genome. Whilst there are no hard and fast rules, for a histone marks, window sizes of 150bp sliding in steps of 50bp may be a good choice.

win_size <- 150
win_step <- 50

The example dataset for today can be easily managed on a laptop, however, it should be noted that for a full-scale dataset, large amounts of RAM are required, and this process may take considerable time. In the following we define some key parameters for passing to csaw::windowCounts(), where we restrict counting to chr10, provide out black and grey-lists and let the function know that these are all single-end reads.

From there, we perform the counts using the window step & size as defined above, extending each read to represent a full DNA fragment of 200nt, and discarding any regions with fewer than 1 read/sample.

rp <- readParam(pe = "none", restrict = "chr10", discard = bg_list)
wincounts <- windowCounts(
  bam.files = h3k_bfl,
  spacing = win_step, width = win_size, ext = 200,
  filter = length(h3k_bfl),
  param = rp
)
seqlevels(wincounts) <- seqlevels(sq)
seqinfo(wincounts) <- sq
dim(wincounts)
## [1] 467904      7

Many of these windows will have counted reads which are essentially background noise, and we wish to exclude these and only retain those which confidently contain signal from our ChIP target. extraChIPs provides a function dualFilter() in which a set of guide ranges can be provided where signal is expected. Thresholds for minimum overall signal, and signal relative to input are set based on retaining a give proportion of reads which overlap these ranges, with lower numbers returning more stringently selected ranges.

A set of guide ranges has been prepared following a similar process to forming union peaks above for ER\(\alpha\), and can be loaded then passed to dualFilter(). Again, this dataset is easily handled on a laptop, whilst for a complete genome, considerable amounts of RAM will be required.

guide_ranges <- file.path("data", "H3K27ac", "H3K27ac_chr10.bed") %>% 
  import.bed(seqinfo = sq)
filtcounts <- dualFilter(
  x = wincounts[, samples$H3K27ac$accession],
  bg  =  wincounts[, unique(samples$H3K27ac$input)],
  ref = guide_ranges, q = 0.6
)
colData(filtcounts) <- colData(filtcounts) %>% 
  as_tibble(rownames = "accession") %>% 
  left_join(samples$H3K27ac) %>% 
  as("DataFrame")
dim(filtcounts)
## [1] 19017     6

By default, an additional assay logCPM will be added during this step and this can also be used for QC.

A <- plotAssayDensities(filtcounts, assay = "logCPM", colour = "treatment") + 
  scale_colour_manual(values = treat_colours)
B <- plotAssayPCA(
  filtcounts, assay = "logCPM", colour = "treatment", label = "accession"
) +
  scale_colour_manual(values = treat_colours)
A + B + plot_layout(guides = "collect") + plot_annotation(tag_levels = "A") 
A) Density plot and B) PCA based on logCPM values from the sliding windows retained after filtering.

Figure 13: A) Density plot and B) PCA based on logCPM values from the sliding windows retained after filtering

Now we can pass our filtered windows to fitAssayDiff() as we did previously, however, once we have performed statistical testing on each window, we need to merge windows to obtain our final results.

X <- model.matrix(~treatment, data = colData(filtcounts))
h3k_all <- fitAssayDiff(filtcounts, norm = "TMM", design = X, asRanges = TRUE)

Although we haven’t declared in regions as showing differential signal, we can still inspect an MA plot for any unexpected issues in our data

h3k_all %>% 
  as_tibble() %>% 
  ggplot(aes(logCPM, logFC)) +
  geom_point() +
  geom_smooth(se = FALSE)
MA plot using all sliding windows retained after the initial filtering step.

Figure 14: MA plot using all sliding windows retained after the initial filtering step

As expected with H3K27ac signal, the total amount of signal obtained under two related conditions is similar and we have a nicely centred plot.

4.2.2 Merging Windows

Now we can merge overlapping windows to obtain our final results. Using extraChIPs this can be performed using all methods offered by csaw, (?mergeByCol and ?mergeBySig) as well as using the asymptotically-exact harmonic mean p-value, which we’ll use here.

h3k_results <- mergeByHMP(h3k_all, min_win = 2, merge_within = win_size + 1) %>% 
  addDiffStatus()
h3k_results
## GRanges object with 643 ranges and 9 metadata columns:
##         seqnames            ranges strand | n_windows      n_up    n_down
##            <Rle>         <IRanges>  <Rle> | <integer> <integer> <integer>
##     [1]    chr10 42862951-42863350      * |         6         5         0
##     [2]    chr10 43047801-43048200      * |         6         0         3
##     [3]    chr10 43048401-43048800      * |         6         0         3
##     [4]    chr10 43132901-43134800      * |        36        13        16
##     [5]    chr10 43135701-43137250      * |        29         7         3
##     ...      ...               ...    ... .       ...       ...       ...
##   [639]    chr10 99473501-99474250      * |        13         0         5
##   [640]    chr10 99496401-99497850      * |        27         9        18
##   [641]    chr10 99608301-99610050      * |        32         0         8
##   [642]    chr10 99628751-99629400      * |        11         2         6
##   [643]    chr10 99894101-99895400      * |        24         4        13
##                    keyval_range    logCPM      logFC       hmp   hmp_fdr
##                       <GRanges> <numeric>  <numeric> <numeric> <numeric>
##     [1] chr10:42863051-42863200   6.31662  0.3345653 0.4041174  0.997968
##     [2] chr10:43048001-43048150   6.16996 -0.0780895 0.8764236  0.997968
##     [3] chr10:43048601-43048750   6.31199 -0.2619553 0.6083939  0.997968
##     [4] chr10:43134251-43134400   7.67436  0.1350775 0.5145484  0.997968
##     [5] chr10:43137101-43137250   6.87832  0.0791231 0.0119809  0.126290
##     ...                     ...       ...        ...       ...       ...
##   [639] chr10:99473851-99474000   6.43974 -0.4427309 0.2576771  0.815392
##   [640] chr10:99497701-99497850   7.41777  0.0551099 0.9923554  0.997968
##   [641] chr10:99608601-99608750   7.10450 -0.5008959 0.0309346  0.236797
##   [642] chr10:99628851-99629000   6.63987 -0.2298606 0.6588732  0.997968
##   [643] chr10:99895051-99895200   8.08365 -0.1143540 0.6362419  0.997968
##            status
##          <factor>
##     [1] Unchanged
##     [2] Unchanged
##     [3] Unchanged
##     [4] Unchanged
##     [5] Unchanged
##     ...       ...
##   [639] Unchanged
##   [640] Unchanged
##   [641] Unchanged
##   [642] Unchanged
##   [643] Unchanged
##   -------
##   seqinfo: 1 sequence from hg19 genome

Notice that we have additional columns not returned when using fixed-width windows. The first three indicate how many original windows were merged, along with how many may have been individually considered as showing differential signal in either direction. Next to these is the column keyval_range which indicates the original window which returned the lowest p-value, or by setting keyval = "merged", will return the range containing all original windows with raw p-values below the harmonic-mean p-value.

Given the harmonic mean involves summing the inverse of raw p-values, these are considered to be weights and the returned values for logCPM and logFC are also weighted averages, using \(w_i = \frac{1}{p_i}\) as weights for each windows \(i\). Given this, any MA-plots may appear to show bias, which is in fact due to the weighting strategy used here. As such the previous figure using all windows prior to merging is the best strategy for identifying any issues with the data

h3k_results %>% 
  as_tibble() %>% 
  ggplot(aes(logCPM, logFC)) +
  geom_point(aes(colour = status), alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_colour_manual(values = status_colours)
MA plot showing results after merging overlapping windows.

Figure 15: MA plot showing results after merging overlapping windows

5 Comparing Both Sets Of Results

extraChIPs provide additional capacity for comparing between two or more sets of results. The key structure for this is a GRangesList, so let’s ensure we have compatible column names, then form a combined object.

h3k_results <- h3k_results %>% 
  select(logCPM, logFC, PValue = hmp, FDR = hmp_fdr, status) %>% 
  mutate(region = bestOverlap(., unlist(regions), var = "region")) %>% 
  mapByFeature(genes = gtf$gene, prom = regions$promoter)
both_results <- GRangesList(ER = er_results, H3K27ac = h3k_results) 

5.1 Association Between Datasets

Now we have a combined GRangesList object, we can plot columns from each element against the other. If we wish to compare signal in sites where both are detected, we can use plotPairwise(), which will also show boxplots of signal for where only one target is detected, alongside those where both targets are detected

plotPairwise(both_results, var = "logCPM")
Comparison of logCPM values when both targets are detected, with boxplots showing the range of values when each target was detected alone, or when both were detected together

Figure 16: Comparison of logCPM values when both targets are detected, with boxplots showing the range of values when each target was detected alone, or when both were detected together

Although we only have a handful of sites in our example dataset, one might easily suspect that signal for both targets is higher when both are detected together.

Alternatively, we might wish to see which sites are changing together, or where only one is changing. This time we’ll plot logFC, colouring points by the combined Differential Signal status in each element, and adding labels from the gene_name column. Labels are only added to the sites in each combined group which are furthest from zero.

plotPairwise(
  both_results, var = "logFC", colour = "status", label = "gene_name"
) +
  scale_colour_manual(
    values = c("lightskyblue", "palevioletred", "grey", "darkred")
  ) +
  scale_fill_manual(
    values = c(
      setNames(status_colours, paste("ER", names(status_colours))),
      setNames(status_colours, paste("H3K27ac", names(status_colours)))
    )
  )
Comparison of logFC values when both targets are detcted, with boxplots showing the range of values when only one target is detected, or when both are detected

Figure 17: Comparison of logFC values when both targets are detcted, with boxplots showing the range of values when only one target is detected, or when both are detected

We can also compare columns by performing an operation somewhat analogous to pivot_wider where we find overlapping ranges and add columns from each element to a set of consensus peaks.

mapGrlCols(both_results, var = "status") 
## GRanges object with 741 ranges and 2 metadata columns:
##         seqnames            ranges strand | ER_status H3K27ac_status
##            <Rle>         <IRanges>  <Rle> |  <factor>       <factor>
##     [1]    chr10 42862951-42863350      * | NA             Unchanged
##     [2]    chr10 43047801-43048800      * | Unchanged      Unchanged
##     [3]    chr10 43047801-43048800      * | Unchanged      Unchanged
##     [4]    chr10 43132901-43134800      * | NA             Unchanged
##     [5]    chr10 43135701-43137250      * | NA             Unchanged
##     ...      ...               ...    ... .       ...            ...
##   [737]    chr10 99496401-99497850      * | NA             Unchanged
##   [738]    chr10 99608301-99610050      * | NA             Unchanged
##   [739]    chr10 99621632-99621988      * | Unchanged      NA       
##   [740]    chr10 99628751-99629400      * | NA             Unchanged
##   [741]    chr10 99894101-99895400      * | NA             Unchanged
##   -------
##   seqinfo: 1 sequence from hg19 genome

We can use this to quickly find sites which are of interest in both sets of results, along with their targets

both_results %>% 
  mapGrlCols(var = c("region", "status", "FDR"), collapse = "gene_name") %>% 
  filter(
    str_detect(ER_status, "Increased|Decreased"),
    str_detect(H3K27ac_status, "Increased|Decreased")
  )
## GRanges object with 3 ranges and 7 metadata columns:
##       seqnames            ranges strand |                ER_region
##          <Rle>         <IRanges>  <Rle> |                 <factor>
##   [1]    chr10 71879101-71881150      * | Promoter (-2500/+500)   
##   [2]    chr10 79623301-79635950      * | Promoter (-2500/+500)   
##   [3]    chr10 81101651-81103150      * | Upstream Promoter (-5kb)
##                 H3K27ac_region ER_status H3K27ac_status      ER_FDR H3K27ac_FDR
##                       <factor>  <factor>       <factor>   <numeric>   <numeric>
##   [1] Promoter (-2500/+500)    Increased      Increased 4.46675e-02 8.68739e-03
##   [2] Promoter (-2500/+500)    Increased      Increased 3.88891e-04 7.76644e-06
##   [3] Upstream Promoter (-5kb) Increased      Increased 1.12879e-08 1.60656e-05
##                   gene_name
##                 <character>
##   [1]                 AIFM2
##   [2] DLG5; ENSG00000204049
##   [3]                  PPIF
##   -------
##   seqinfo: 1 sequence from hg19 genome

This also makes comparison of sites easy using plotSplitDonut() and in the following we’ll only show sites where both are detected.

both_results %>% 
  mapGrlCols(var = "status") %>% 
  as_tibble() %>% 
  dplyr::filter(!if_any(ends_with("status"), is.na)) %>% 
  dplyr::rename_with(\(x) str_remove_all(x, "_status")) %>% 
  plotSplitDonut(
    inner = "H3K27ac", outer = "ER",
    inner_palette = status_colours, outer_palette = status_colours,
    inner_glue = "{inner}\n{.data[[inner]]}\n{n}",
    outer_glue = "{outer}\n{.data[[outer]]}\n{n}",
    label_alpha = 0.8, min_p = 0.02,
    explode_inner = "Increased", explode_outer = "Increased", 
    explode_query = "AND", explode_r = 0.3
  )
Combined status when both targets are detected together

Figure 18: Combined status when both targets are detected together

Perhaps we might wish to look at the combined status in reference to the genomic regions we defined previously. Here, we’ll scale the site by width to show results in kb, instead of counting sites.

both_results %>% 
  mapGrlCols(var = c("region", "status")) %>% 
  as_tibble(rangeAsChar = FALSE) %>% 
  dplyr::filter(!if_any(ends_with("status"), is.na)) %>% 
  mutate(
    ER_status = fct_relabel(ER_status, \(x) paste("ER", x)),
    H3K27ac_status = fct_relabel(H3K27ac_status, \(x) paste("H3K27ac", x)),
    combined = fct_cross(ER_status, H3K27ac_status, sep = "\n")
  ) %>% 
  plotSplitDonut(
    scale_by = "width",
    inner = "H3K27ac_region", outer = "combined",
    inner_palette = region_colours,
    total_glue = "{comma(N)}kb",
    inner_glue = "{str_wrap(.data[[inner]], 15)}\n{comma(n, 1)}kb",
    outer_glue = "{.data[[outer]]}\n{comma(n, 0.1)}kb",
    outer_min_p = 0.01, #outer_max_p = 0.02,
    explode_outer = "(Inc|Dec).+(Inc|Dec)", explode_r = 0.4,
    label_alpha = 0.7
  )
Combined status shown by the region, as defined by the best overlap for H3K27ac signal. Regions here are scaled by width and totals shown in kb

Figure 19: Combined status shown by the region, as defined by the best overlap for H3K27ac signal
Regions here are scaled by width and totals shown in kb

5.2 Regions in Context

As well the above strategies for inspecting results, looking directly at the sites using a genome browser is a common strategy. extraChIPs provides the function plotHFGC() which enables us to take a genome level viewpoint of any 1) HiC Interactions, 2) Features, 3) Gene Models and 4) Coverage. All tracks are optional and will be displayed in this order when provided using the various track types provided by Gviz (Hahne and Ivanek 2016). In order to use this approach to scan through our results choosing one ranges at a time, let’s define the elements that we have starting with our genomic regions, which we would consider to be a set of genomic features. For this track, we’ll need a GRangesList() and each element will be filled a different colour, meaning we’ll need a matching vector of colours. We already have both of these from earlier on our workflow, but we will need to change the names on our colour vector

names(region_colours) <- names(regions)

Next we’ll setup our gene models in the format that Gviz is familiar with, using the exons element from our GTF annotations.

gene_models <- gtf$exon %>% 
  select(
    type, gene = gene_id, exon = exon_id, transcript = transcript_id, 
    symbol = gene_name
  )

This leaves coverage as our final track, or set of tracks. plotHFGC() takes BigWigFileList objects for coverage and if passing a single BigwigFileList, each element will be drawn on a separate track. For our results we have both ER and H3K27ac so a more informative plot might be to overlay the two treatments in a single track for each target. To facilitate this, we need a named list of BigwigFileList objects and a matching named list of colours. In each of our data directories we have summarised coverage for E2 and E2DHT

targets <- c("ER", "H3K27ac")
bwfl <- list(
  ER = file.path("data", "ER", glue("{treat_levels}_cov_chr10.bw")),
  H3K27ac = file.path("data", "H3K27ac", glue("{treat_levels}_cov_chr10.bw"))
) %>% 
  lapply(setNames, treat_levels) %>% 
  lapply(BigWigFileList)
line_colours <- lapply(bwfl, function(x) treat_colours[names(x)])

Now we have our track setup, we’ll load the cytogenetic bands provided with extraChIPs to ensure we have a band at the top of the plot, although this too is optional.

data("grch37.cytobands")

And finally, let’s select our first range as the most highly ranked site for change ER binding, which also appears to show an increase in H3K27ac signal

gr <- both_results %>% 
  mapGrlCols(
    var = c("region", "status", "FDR"), collapse = "gene_name"
  ) %>% 
  arrange(ER_FDR) %>% 
  .[1]

Now we pass all of these to plotHFGC() zooming out a little to try and get some kind of wider context.

plotHFGC(
  gr, cytobands = grch37.cytobands, 
  features = regions, featcol = region_colours,
  genes = gene_models, genecol = "wheat", 
  coverage =  bwfl, linecol = line_colours,
  zoom = 10
)
Default plot from plotHFGC showing both ER and H3K27ac signal on separate tracks, with the two treatment groups overlaid as separately coloured lines.

Figure 20: Default plot from plotHFGC showing both ER and H3K27ac signal on separate tracks, with the two treatment groups overlaid as separately coloured lines

One helpful feature of is the ability to annotate our coverage tracks to indicate statistical significance. If we simply split our initial set of results into GRangesList objects based on their status, we can use these to add coloured bars above our coverage tracks to denote that signal was detected there, and what the results from analysis were. We can simply pass our vector of status colours that we’ve already defined for these

cov_annot <- list(
  ER = splitAsList(er_results, er_results$status),
  H3K27ac = splitAsList(h3k_results, h3k_results$status)
)

Now let’s add these annotations and tweak the appearance of the plot

plotHFGC(
  gr, cytobands = grch37.cytobands, 
  features = regions, featcol = region_colours, featstack = "dense",
  genes = gene_models, genecol = "wheat",
  coverage =  bwfl, linecol = line_colours,
  annotation = cov_annot, annotcol = status_colours,
  zoom = 10, featsize = 2, title.width = 0.9, 
  cex.axis = 1, cex.title = 1, rotation.title = 90,  fontsize = 12
)
Customised output from plotHFGC, adding annotations for each site as well as tweaking the labels and font sizes

Figure 21: Customised output from plotHFGC, adding annotations for each site as well as tweaking the labels and font sizes

Just like we were able to provide a separate BigWigFileList to indicate separate tracks for our targets, we can use a similar strategy for the genes and features tracks.

  1. A GRangesList of gene will draw each element on a separate track. This may be useful for indicating results from a Differential Gene Expression analysis by plotting up-regulated and down-regulated genes on separate tracks in different colours
  2. Passing a named list of GRangesList objects to the features track will add each element of the list as a separate tracks. In this way you may wish to add multiple features, such as a chromHMM track along with CTCF or other known binding sites.

5.3 Profile Heatmaps

A final way of visualising results may be to draw a Profile Heatmap, where we show coverage or fold-enrichment over background in the surrounding regions. Instead of separating our BigWigFileList into separate tracks, let’s just use a single BigWigFileList with coverage for both ER and H3K27ac.

profile_bwfl <- bwfl %>% 
  lapply(path) %>% 
  unlist() %>% 
  setNames(str_replace_all(names(.), "\\.", " ")) %>% 
  BigWigFileList()

Given we only have a small dataset, let’s just choose the top 20 sites by logFC in the ER dataset, and draw Profile Heatmaps for both targets. We first make a call to getProfileData() which smooths the data into bins around the site

er_sig <- arrange(er_results, -logFC)[1:20]
pd <- getProfileData(profile_bwfl, er_sig)

We can now pass this to plotProfileHeatmap() and can use any of the categorical columns in the original set of ranges to facet our results. Here we’ll facet by status in the ER dataset, which will also draw lines in separate colours.

plotProfileHeatmap(pd, "profile_data", facetY = "status") +
  scale_colour_manual(values = status_colours) +
  scale_fill_gradient(low = "white", high = "red") +
  labs(fill = "logCPM")
Profile Heatmap for the 20 most highly ranked sites for increased ER binding. Ranges are centred around the ER binding patterns. The double peak seen earlier in the output from plotHFGC is at the top of the first panel.`

Figure 22: Profile Heatmap for the 20 most highly ranked sites for increased ER binding
Ranges are centred around the ER binding patterns. The double peak seen earlier in the output from plotHFGC is at the top of the first panel.`

A complete dataset would provide a far cleaner set of results, but we can still see the general increases in ER signal and how these relate to H3K27ac signal.

References

Chen, Yunshun, Aaron A T Lun, and Gordon K Smyth. 2016. “From Reads to Genes to Pathways: Differential Expression Analysis of RNA-Seq Experiments Using Rsubread and the edgeR Quasi-Likelihood Pipeline.” F1000Research 5: 1438. https://doi.org/10.12688/f1000research.8987.2.
Hahne, Florian, and Robert Ivanek. 2016. “Statistical Genomics: Methods and Protocols.” In, edited by Ewy Mathé and Sean Davis, 335–51. New York, NY: Springer New York. https://doi.org/10.1007/978-1-4939-3578-9_16.
Hickey, Theresa E, Luke A Selth, Kee Ming Chia, Geraldine Laven-Law, Heloisa H Milioli, Daniel Roden, Shalini Jindal, et al. 2021. “The Androgen Receptor Is a Tumor Suppressor in Estrogen Receptor-Positive Breast Cancer.” Nat. Med. 27 (2): 310–20.
Krassowski, Michał. 2020. “ComplexUpset.” https://doi.org/10.5281/zenodo.3700590.
Law, Charity W, Yunshun Chen, Wei Shi, and Gordon K Smyth. 2014. “Voom: Precision Weights Unlock Linear Model Analysis Tools for RNA-seq Read Counts.” Genome Biol. 15 (2): R29.
Lun, Aaron T L, and Gordon K Smyth. 2016. “Csaw: A Bioconductor Package for Differential Binding Analysis of ChIP-Seq Data Using Sliding Windows.” Nucleic Acids Res. 44 (5): e45.
Lund, Steven P, Dan Nettleton, Davis J McCarthy, and Gordon K Smyth. 2012. “Detecting Differential Expression in RNA-sequence Data Using Quasi-Likelihood with Shrunken Dispersion Estimates.” Stat. Appl. Genet. Mol. Biol. 11 (5).
McCarthy, Davis J, and Gordon K Smyth. 2009. “Testing Significance Relative to a Fold-Change Threshold Is a TREAT.” Bioinformatics 25 (6): 765–71.
Pederson, Stevie. 2023. “prepareChIPs:” https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.528.1.
Ross-Innes, Caryn S., Rory Stark, Andrew E. Teschendorff, Kelly A. Holmes, H. Raza Ali, Mark J. Dunning, Gordon D. Brown, et al. 2012. “Differential Oestrogen Receptor Binding Is Associated with Clinical Outcome in Breast Cancer.” Nature 481: –4. http://www.nature.com/nature/journal/v481/n7381/full/nature10730.html.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wilson, Daniel J. 2019. “The Harmonic Mean p-Value for Combining Dependent Tests.” Proc. Natl. Acad. Sci. U. S. A. 116 (4): 1195–1200.
Zhang, Yong, Tao Liu, Clifford A Meyer, Jérôme Eeckhoute, David S Johnson, Bradley E Bernstein, Chad Nusbaum, et al. 2008. “Model-Based Analysis of ChIP-Seq (MACS).” Genome Biol. 9 (9): R137.

Appendix

A Session info

## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
##  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
##  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Australia/Adelaide
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] patchwork_1.1.2             rtracklayer_1.60.0         
##  [3] edgeR_3.42.4                limma_3.56.2               
##  [5] csaw_1.34.0                 Rsamtools_2.16.0           
##  [7] Biostrings_2.68.1           XVector_0.40.0             
##  [9] extraChIPs_1.5.9            SummarizedExperiment_1.30.2
## [11] Biobase_2.60.0              MatrixGenerics_1.12.2      
## [13] matrixStats_1.0.0           ggside_0.2.2               
## [15] BiocParallel_1.34.2         plyranges_1.20.0           
## [17] GenomicRanges_1.52.0        GenomeInfoDb_1.36.1        
## [19] IRanges_2.34.1              S4Vectors_0.38.1           
## [21] BiocGenerics_0.46.0         glue_1.6.2                 
## [23] lubridate_1.9.2             forcats_1.0.0              
## [25] stringr_1.5.0               dplyr_1.1.2                
## [27] purrr_1.0.1                 readr_2.1.4                
## [29] tidyr_1.3.0                 tibble_3.2.1               
## [31] ggplot2_3.4.2               tidyverse_2.0.0            
## [33] BiocStyle_2.28.0           
## 
## loaded via a namespace (and not attached):
##   [1] splines_4.3.1              BiocIO_1.10.0             
##   [3] bitops_1.0-7               filelock_1.0.2            
##   [5] polyclip_1.10-4            XML_3.99-0.14             
##   [7] rpart_4.1.19               lifecycle_1.0.3           
##   [9] doParallel_1.0.17          lattice_0.21-8            
##  [11] ensembldb_2.24.0           MASS_7.3-60               
##  [13] backports_1.4.1            magrittr_2.0.3            
##  [15] Hmisc_5.1-0                sass_0.4.7                
##  [17] rmarkdown_2.23             jquerylib_0.1.4           
##  [19] yaml_2.3.7                 metapod_1.8.0             
##  [21] Gviz_1.44.0                DBI_1.1.3                 
##  [23] RColorBrewer_1.1-3         abind_1.4-5               
##  [25] zlibbioc_1.46.0            AnnotationFilter_1.24.0   
##  [27] biovizBase_1.48.0          RCurl_1.98-1.12           
##  [29] nnet_7.3-19                tweenr_2.0.2              
##  [31] VariantAnnotation_1.46.0   rappdirs_0.3.3            
##  [33] circlize_0.4.15            GenomeInfoDbData_1.2.10   
##  [35] ggrepel_0.9.3              codetools_0.2-19          
##  [37] DelayedArray_0.26.6        xml2_1.3.5                
##  [39] ggforce_0.4.1              tidyselect_1.2.0          
##  [41] shape_1.4.6                futile.logger_1.4.3       
##  [43] farver_2.1.1               ComplexUpset_1.3.3        
##  [45] BiocFileCache_2.8.0        base64enc_0.1-3           
##  [47] GenomicAlignments_1.36.0   jsonlite_1.8.7            
##  [49] GetoptLong_1.0.5           Formula_1.2-5             
##  [51] iterators_1.0.14           foreach_1.5.2             
##  [53] tools_4.3.1                progress_1.2.2            
##  [55] Rcpp_1.0.11                gridExtra_2.3             
##  [57] mgcv_1.8-42                xfun_0.39                 
##  [59] withr_2.5.0                formatR_1.14              
##  [61] BiocManager_1.30.21.1      fastmap_1.1.1             
##  [63] latticeExtra_0.6-30        fansi_1.0.4               
##  [65] digest_0.6.33              timechange_0.2.0          
##  [67] R6_2.5.1                   colorspace_2.1-0          
##  [69] jpeg_0.1-10                dichromat_2.0-0.1         
##  [71] biomaRt_2.56.1             RSQLite_2.3.1             
##  [73] utf8_1.2.3                 generics_0.1.3            
##  [75] data.table_1.14.8          prettyunits_1.1.1         
##  [77] InteractionSet_1.28.1      httr_1.4.6                
##  [79] htmlwidgets_1.6.2          S4Arrays_1.0.5            
##  [81] pkgconfig_2.0.3            gtable_0.3.3              
##  [83] blob_1.2.4                 ComplexHeatmap_2.16.0     
##  [85] htmltools_0.5.5            bookdown_0.34             
##  [87] ProtGenerics_1.32.0        clue_0.3-64               
##  [89] scales_1.2.1               png_0.1-8                 
##  [91] knitr_1.43                 lambda.r_1.2.4            
##  [93] rstudioapi_0.15.0          tzdb_0.4.0                
##  [95] rjson_0.2.21               nlme_3.1-162              
##  [97] checkmate_2.2.0            curl_5.0.1                
##  [99] cachem_1.0.8               GlobalOptions_0.1.2       
## [101] parallel_4.3.1             foreign_0.8-84            
## [103] AnnotationDbi_1.62.2       restfulr_0.0.15           
## [105] pillar_1.9.0               grid_4.3.1                
## [107] vctrs_0.6.3                dbplyr_2.3.3              
## [109] cluster_2.1.4              htmlTable_2.4.1           
## [111] evaluate_0.21              VennDiagram_1.7.3         
## [113] EnrichedHeatmap_1.30.0     GenomicFeatures_1.52.1    
## [115] cli_3.6.1                  locfit_1.5-9.8            
## [117] compiler_4.3.1             futile.options_1.0.1      
## [119] rlang_1.1.1                crayon_1.5.2              
## [121] labeling_0.4.2             interp_1.1-4              
## [123] stringi_1.7.12             deldir_1.0-9              
## [125] munsell_0.5.0              lazyeval_0.2.2            
## [127] Matrix_1.6-0               BSgenome_1.68.0           
## [129] hms_1.1.3                  bit64_4.0.5               
## [131] KEGGREST_1.40.0            highr_0.10                
## [133] igraph_1.5.0.1             broom_1.0.5               
## [135] memoise_2.0.1              bslib_0.5.0               
## [137] bit_4.0.5                  GenomicInteractions_1.34.0